This notebook presents a preliminary exploratory data analysis (EDA) of the CardioGoodFitness dataset. The functions below are used to explore the dataset and extract basic observations about the data.
At the end of this exercise, I'll generate a set of insights and recommendations that will help the company target new customers.
This Exploratory Data Analysis will be divided into 3 sections:
The following command installs the development version of pandas-profiling (the master branch on GitHub), a Python library for basic exploratory data analysis (EDA).
Note: running pip as {sys.executable} -m pip ensures that the pip being used is the one associated with the current Python kernel.
import sys
!{sys.executable} -m pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
(Output condensed: pip reports that pandas-profiling 2.12.0 and all of its dependencies are already satisfied in the Anaconda environment, then builds the pandas-profiling wheel successfully.)
!pip install pandas-profiling
(Output condensed: pip reports that pandas-profiling 2.12.0 and all of its dependencies are already satisfied.)
Once pandas-profiling is installed, I import the numpy, pandas, matplotlib and seaborn packages, as well as the ProfileReport class from the pandas_profiling package.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas_profiling import ProfileReport
The following command reads the CSV file containing the data and assigns it to the variable 'data'. The head() method then shows the first five rows.
data = pd.read_csv("CardioGoodFitness.csv")
data.head()
| | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TM195 | 18 | Male | 14 | Single | 3 | 4 | 29562 | 112 |
| 1 | TM195 | 19 | Male | 15 | Single | 2 | 3 | 31836 | 75 |
| 2 | TM195 | 19 | Female | 14 | Partnered | 4 | 3 | 30699 | 66 |
| 3 | TM195 | 19 | Male | 12 | Single | 3 | 3 | 32973 | 85 |
| 4 | TM195 | 20 | Male | 13 | Partnered | 4 | 2 | 35247 | 47 |
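To experiment with the code in this notebook without the CSV file, the five rows printed above can be rebuilt in memory. This is a minimal sketch: the variable name 'sample' and the inline CSV text are my own, and only these five rows (not the full 180) are reproduced.

```python
import io

import pandas as pd

# The five rows printed by data.head() above, copied as CSV text.
SAMPLE_CSV = """Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
TM195,18,Male,14,Single,3,4,29562,112
TM195,19,Male,15,Single,2,3,31836,75
TM195,19,Female,14,Partnered,4,3,30699,66
TM195,19,Male,12,Single,3,3,32973,85
TM195,20,Male,13,Partnered,4,2,35247,47
"""

# read_csv accepts any file-like object, so StringIO stands in for the file on disk.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(sample.shape)  # (5, 9)
```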
The following command gives brief information about the dataset, including its size, column names, non-null counts and data types.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Product | 180 non-null | object |
| 1 | Age | 180 non-null | int64 |
| 2 | Gender | 180 non-null | object |
| 3 | Education | 180 non-null | int64 |
| 4 | MaritalStatus | 180 non-null | object |
| 5 | Usage | 180 non-null | int64 |
| 6 | Fitness | 180 non-null | int64 |
| 7 | Income | 180 non-null | int64 |
| 8 | Miles | 180 non-null | int64 |
dtypes: int64(6), object(3)
memory usage: 12.8+ KB
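The info() output shows that Education, Usage and Fitness are stored as int64 even though they are later treated as categorical. A possible sketch of making that explicit, assuming the five head() rows stand in for 'data'; the names 'ordinal_cols' and 'continuous_cols' are hypothetical:

```python
import io

import pandas as pd

SAMPLE_CSV = """Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
TM195,18,Male,14,Single,3,4,29562,112
TM195,19,Male,15,Single,2,3,31836,75
TM195,19,Female,14,Partnered,4,3,30699,66
TM195,19,Male,12,Single,3,3,32973,85
TM195,20,Male,13,Partnered,4,2,35247,47
"""
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Columns pandas parsed as numbers vs. everything else (the text columns).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
non_numeric_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Casting the ordinal integer columns to 'category' records that they are labels, not quantities.
ordinal_cols = ["Education", "Usage", "Fitness"]
sample_cat = sample.astype({col: "category" for col in ordinal_cols})

# After the cast, only the truly continuous columns remain numeric.
continuous_cols = sample_cat.select_dtypes(include="number").columns.tolist()
print(continuous_cols)  # ['Age', 'Income', 'Miles']
```

The remaining numeric columns match the 'numericals' list used for the histograms and box plots below.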
The following command generates a report with the ProfileReport class from the pandas_profiling package and assigns it to the variable 'profile'.
profile = ProfileReport(data, title="Cardio Good Fitness Profiling Report")
The following code calls the to_widgets() method on 'profile' to render the report as interactive widgets inside the notebook, organized in tabs.
The Overview tab gives a summary of the dataset, including the number of observations and variables.
The Variables tab gives a univariate analysis of each variable.
The Correlations tab shows a correlation heatmap of the continuous variables.
The Missing values tab shows a bar plot with the count of non-missing values for each variable. There are no missing values in this dataset.
The Sample tab shows the first and last 10 rows of the dataset, similar to head(10) and tail(10).
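The "no missing values" finding from the Missing values tab can also be confirmed directly with pandas. A minimal sketch, assuming the five head() rows stand in for the full 'data' DataFrame (on the real data, the same two calls apply to 'data'):

```python
import pandas as pd

# The five head() rows, standing in for the full 'data' DataFrame.
sample = pd.DataFrame({
    "Product": ["TM195"] * 5,
    "Age": [18, 19, 19, 19, 20],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "Education": [14, 15, 14, 12, 13],
    "MaritalStatus": ["Single", "Single", "Partnered", "Single", "Partnered"],
    "Usage": [3, 2, 4, 3, 4],
    "Fitness": [4, 3, 3, 3, 2],
    "Income": [29562, 31836, 30699, 32973, 35247],
    "Miles": [112, 75, 66, 85, 47],
})

# isna() marks missing cells; sum() counts them per column, mean() gives the share.
missing_counts = sample.isna().sum()
missing_share = sample.isna().mean()
print(missing_counts.sum())  # 0
```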
profile.to_widgets()
The first line of the following code creates a list of the numerical variables. The variable 'Fitness' could also be treated as either categorical or continuous (in a regression, for example), even though, strictly speaking, it is an ordinal variable.
The second line draws a 1 x 3 grid of histograms of these variables with the pandas 'hist' method. Because density=True, each histogram is normalized so that its bars have a total area of 1, rather than showing raw counts.
numericals = ['Age', 'Income', 'Miles']
data[numericals].hist(bins=30, grid=False, figsize=(22, 6), layout=(1,3), color='#86bf91', zorder=2, rwidth=0.9, density=True);
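The density=True option is easy to misread as a percentage of customers. The sketch below uses numpy.histogram, which applies the same density normalization as matplotlib's hist (count divided by total count times bin width), on the five Age values from head(), to show that bar height times bin width sums to 1:

```python
import numpy as np

# Age values of the five head() rows (illustration only, not the full column).
ages = np.array([18, 19, 19, 19, 20])

# Same bin count as the notebook; density=True rescales counts to a probability density.
density, edges = np.histogram(ages, bins=30, density=True)
widths = np.diff(edges)

# The total area under the bars is 1, whatever the number of bins.
area = float(np.sum(density * widths))
print(round(area, 6))  # 1.0
```

So the y-axis of these histograms is a density whose scale depends on the bin width, not a count or a share of customers.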
The following code draws box plots of the dataset's continuous variables, showing the median, the IQR, the mean (showmeans=True) and the outliers.
# Shared styling: boxes, medians, whiskers, caps and outliers all drawn in the same blue
boxprops = dict(linestyle='-', linewidth=2, color='#1F77B4')
medianprops = dict(linestyle='-', linewidth=2, color='#1F77B4')
whiskerprops = dict(linestyle='-', linewidth=2, color='#1F77B4')
capprops = dict(linestyle='-', linewidth=2, color='#1F77B4')
flierprops = dict(marker='o', markerfacecolor='#1F77B4', markersize=12, markeredgecolor='none')
# One box plot per variable in a 1 x 3 grid; the *props and showmeans keywords are passed on to matplotlib's boxplot
data[numericals].plot(kind='box',
                      subplots=True,
                      figsize=(14, 6),
                      layout=(1, 3),
                      fontsize=14,
                      boxprops=boxprops,
                      medianprops=medianprops,
                      whiskerprops=whiskerprops,
                      capprops=capprops,
                      flierprops=flierprops,
                      showmeans=True,  # also mark the mean of each variable
                      legend=True)
plt.subplots_adjust(wspace=0.35)
plt.suptitle("Outlier visualization (small circles) of all 3 continuous variables", fontsize=22)
plt.show()
Observation from box plots: the variables Income and Miles have many outliers, shown here as blue circles. If statistical modeling such as multiple regression were applied with 'Miles' as the outcome variable, these outliers would need to be handled first, for example by imputation or deletion.
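The circles in these box plots follow matplotlib's default whisker rule (whis=1.5): a point is drawn as an outlier when it lies below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. A minimal sketch of flagging the same points with pandas, so they can be counted, imputed or deleted; the helper names 'iqr_bounds' and 'iqr_outlier_mask' are hypothetical, and the five Miles values from head() are used only as input:

```python
import pandas as pd


def iqr_bounds(s: pd.Series, whis: float = 1.5) -> tuple:
    """Return (lower, upper) whisker limits using the 1.5 x IQR rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation, as in numpy.percentile
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr


def iqr_outlier_mask(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """True where a value falls outside the whisker limits."""
    lower, upper = iqr_bounds(s, whis)
    return (s < lower) | (s > upper)


miles = pd.Series([112, 75, 66, 85, 47], name="Miles")  # Miles of the five head() rows
bounds = iqr_bounds(miles)
mask = iqr_outlier_mask(miles)
print(bounds)      # (37.5, 113.5)
print(mask.sum())  # 0, none of these five rows is an outlier
# Deletion on the full data would then be: data[~iqr_outlier_mask(data["Miles"])]
```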
The following code creates a list of categorical variables called 'categoricals', creates one subplot per categorical variable, and loops through them to draw a bar chart of each variable's category counts (ordered by category) with the pandas plot function, with options for colors and x-tick rotation.
categoricals = ["Product", "Gender", "Education", "MaritalStatus", "Usage", "Fitness"]
fig, ax = plt.subplots(1, len(categoricals), figsize=(22, 6))
# One bar chart of category counts per variable; sort_index() keeps ordinal levels (e.g. Fitness 1-5) in order
for i, categorical in enumerate(categoricals):
    data[categorical].value_counts().sort_index().plot(kind='bar', ax=ax[i], color=['#ffa600', '#ff6361', '#bc5090', '#58508d', '#003f5c'], rot=0).set_title(categorical)
This section will include informative visualizations to answer some questions about the dataset.
Mosaic plots make it possible to visualize multivariate categorical data in an informative way. In this plot, I look at how customers' self-rated fitness scores relate to gender.
I first import the mosaic function from statsmodels and then draw a mosaic plot with color properties: red for Male and blue for Female.
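from statsmodels.graphics.mosaicplot import mosaic
# Each tile key is a tuple of category labels (fitness level, gender); 'Male' in key is an exact
# tuple-membership test, so it does not match 'Female'
props = lambda key: {'color': 'r' if 'Male' in key else 'b'}
# mosaic returns the figure and a dict of tile rectangles; keep the figure for tight_layout
fig, _ = mosaic(data.sort_values('Fitness'),
                ['Fitness', 'Gender'],
                gap=0.05,
                title='Self-Rated Fitness Scores and Gender Mosaic Plot',
                properties=props,
                labelizer=lambda k: '')  # hide the text label inside each tile
fig.tight_layout();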
Observation from Mosaic Plot: the plot shows that about 50% of customers, male and female alike, give themselves a self-rated fitness score of 3, meaning they consider themselves fairly fit. Among customers who rate themselves very fit (5), most are men. One possible explanation is that women tend to be harder on themselves when rating their fitness, so this should be kept in mind when interpreting the scores.
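The proportions read off the mosaic plot can be checked numerically with value_counts(normalize=True) and pd.crosstab(..., normalize="index"). A minimal sketch using the Fitness and Gender values of the five head() rows as placeholder input; on the full data, replace 'sample' with 'data':

```python
import pandas as pd

# Fitness and Gender of the five head() rows (illustration only).
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "Fitness": [4, 3, 3, 3, 2],
})

# Share of all customers at each fitness level.
share_by_fitness = sample["Fitness"].value_counts(normalize=True).sort_index()

# Within each fitness level, the share of each gender (each row sums to 1).
gender_within_fitness = pd.crosstab(sample["Fitness"], sample["Gender"], normalize="index")
print(share_by_fitness)
print(gender_within_fitness)
```

On the full dataset, the Fitness 3 entry of share_by_fitness is the "about 50%" figure, and the Fitness 5 row of gender_within_fitness gives the male/female split among the fittest customers.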
The following code first computes the pairwise (Pearson) correlation between all numeric columns of the dataframe. NA values are excluded automatically, and the non-numeric columns (Product, Gender and MaritalStatus) are left out.
A correlation heatmap is then drawn with the seaborn function 'heatmap', with options to print the correlation value in each cell and to mask the redundant upper triangle.
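# Correlate only the numeric columns; recent pandas versions raise an error on text columns instead of skipping them
corr = data.select_dtypes(include='number').corr()
fig, ax = plt.subplots(1, 1, figsize=(22, 8))
# Boolean mask hiding the upper triangle (and diagonal) so each pair is shown once
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(
    corr,
    vmin=-1, vmax=1, center=0,
    cmap=sns.diverging_palette(20, 220, n=200),
    square=True,
    annot=True,  # print the coefficient in each cell
    linewidths=.5,
    mask=mask,
    ax=ax
)
ax.set_xticklabels(
    ax.get_xticklabels(),
    rotation=45,
    horizontalalignment='right'
)
fig.tight_layout();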
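The values in this heatmap are Pearson coefficients, the default method of pandas' corr(): r is the sum of the products of the two variables' deviations from their means, divided by the square root of the product of their sums of squared deviations. The sketch below, with a hypothetical helper 'pearson_r', computes r by hand for Age and Income of the five head() rows and checks it against pandas:

```python
import numpy as np
import pandas as pd


def pearson_r(x, y) -> float:
    """Pearson correlation: sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


# Age and Income of the five head() rows (illustration only).
age = [18, 19, 19, 19, 20]
income = [29562, 31836, 30699, 32973, 35247]

manual = pearson_r(age, income)
via_pandas = pd.Series(age).corr(pd.Series(income))  # Series.corr defaults to method="pearson"
print(round(manual, 3), round(via_pandas, 3))
```

With only five rows this coefficient says nothing about the full dataset; it only demonstrates the computation.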
Observation from Correlation Matrix Plot: as a common rule of thumb, a correlation of 0.7 or above is considered strong. From this correlation plot, we can see that the variable "Miles" has a fairly strong correlation with "Usage" (0.76) and a strong correlation with "Fitness". From these correlations we can conclude that the higher customers rate their own fitness, the more miles they tend to run.
On the other hand, the variable "Age" shows little or no correlation with "Usage" (0.02), "Fitness" (0.06) and "Miles" (0.04), while showing a moderate correlation with "Income" (r=0.51).
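Reading strong pairs off a large heatmap is error-prone, so the 0.7 rule of thumb can also be applied programmatically. A possible sketch, with a hypothetical helper 'strong_pairs' that takes each pair once from the upper triangle and keeps those with |r| >= 0.7; the five head() rows are placeholder input, and on the real data it would be called as strong_pairs(data.select_dtypes(include="number").corr()):

```python
import numpy as np
import pandas as pd


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Each variable pair once (upper triangle, no diagonal) with |r| >= threshold, strongest first."""
    rows, cols = np.triu_indices(len(corr), k=1)  # k=1 skips the diagonal
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
    )
    strong = pairs[pairs.abs() >= threshold]
    return strong.reindex(strong.abs().sort_values(ascending=False).index)


# Numeric columns of the five head() rows (illustration only).
sample = pd.DataFrame({
    "Age": [18, 19, 19, 19, 20],
    "Education": [14, 15, 14, 12, 13],
    "Usage": [3, 2, 4, 3, 4],
    "Fitness": [4, 3, 3, 3, 2],
    "Income": [29562, 31836, 30699, 32973, 35247],
    "Miles": [112, 75, 66, 85, 47],
})
pairs = strong_pairs(sample.corr())
print(pairs)
```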
The following code creates a scatterplot to look at the relationship between Age, Miles, Gender and Marital Status.
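fig, ax = plt.subplots(1, 1, figsize=(15, 10))
# Color encodes gender, dot size encodes marital status
sns.scatterplot(x="Age", y="Miles", data=data, hue="Gender", size="MaritalStatus", sizes=(50, 150), ax=ax)
# Red reference lines at 40 and 200 miles and at ages 18 and 35
ax.axhline(y=40, color='r', linewidth=1)
ax.axvline(x=18, color='r', linewidth=1)
ax.axhline(y=200, color='r', linewidth=1)
ax.axvline(x=35, color='r', linewidth=1)
# Shade the box bounded by the red lines in data coordinates
# (axvspan's ymin/ymax are axes fractions, so they do not follow the 40-200 mile lines)
ax.fill_betweenx([40, 200], 18, 35, color="green", alpha=0.1)
fig.tight_layout();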
Observation from Scatterplot: instead of summarizing the relationship in a single number as above, this scatterplot shows it visually for Age, Miles, Gender and Marital Status. We can see that the bulk of customers are between 18 and 35 years old and run between 45 and 200 miles, with females mostly running up to 120 miles. The size of the dots within the green area shows that many of these customers are single.
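The "bulk of customers" inside the green box can be quantified with boolean masks. A minimal sketch, assuming the box limits drawn above (ages 18 to 35, 40 to 200 miles, both ends inclusive); the helper 'share_in_box' is hypothetical and the five head() rows are placeholder input for 'data':

```python
import pandas as pd


def share_in_box(df: pd.DataFrame, age=(18, 35), miles=(40, 200)):
    """Share of customers inside the age/miles box, and the marital-status mix inside it."""
    in_box = df["Age"].between(*age) & df["Miles"].between(*miles)  # between() is inclusive
    return float(in_box.mean()), df.loc[in_box, "MaritalStatus"].value_counts(normalize=True)


# The five head() rows (illustration only).
sample = pd.DataFrame({
    "Age": [18, 19, 19, 19, 20],
    "Miles": [112, 75, 66, 85, 47],
    "MaritalStatus": ["Single", "Single", "Partnered", "Single", "Partnered"],
})
share, status_mix = share_in_box(sample)
print(share)       # 1.0
print(status_mix)
```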
In this section, the data is grouped by product to compare the mean education, usage, fitness score, income and miles of each product's customers. Then a simple scatterplot is drawn.
products = data.groupby('Product')[['Education', 'Usage', 'Fitness', 'Income', 'Miles']].mean().round(0).reset_index()
products
| | Product | Education | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|
| 0 | TM195 | 15.0 | 3.0 | 3.0 | 46418.0 | 83.0 |
| 1 | TM498 | 15.0 | 3.0 | 3.0 | 48974.0 | 88.0 |
| 2 | TM798 | 17.0 | 5.0 | 5.0 | 75442.0 | 167.0 |
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
# One dot per product: height = mean income, dot size = mean years of education
sns.scatterplot(x="Product", y="Income", data=products, size="Education", sizes=(50, 250), ax=ax)
Observation from simple scatterplot: the product TM798 seems to attract more affluent and more educated customers (a mean income of about 75K versus about 46-49K, and 17 versus 15 years of education). We can hypothesize that TM798 is the latest model and therefore more expensive, although prices are not part of this dataset. This would also fit the general observation that more educated people tend to earn higher salaries.
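The 'products' table only shows means, which hide how many customers bought each model and how skewed their incomes are. A possible sketch using pandas named aggregation to report a customer count, mean and median side by side; the output column names are my own, and since the five head() rows all bought TM195, the sketch yields a single row (on the real data, replace 'sample' with 'data'):

```python
import pandas as pd

# Product and Income of the five head() rows (illustration only).
sample = pd.DataFrame({
    "Product": ["TM195"] * 5,
    "Income": [29562, 31836, 30699, 32973, 35247],
})

# Named aggregation: output_column=(input_column, aggregation).
summary = sample.groupby("Product").agg(
    customers=("Income", "count"),  # non-null count, i.e. customers, since nothing is missing
    mean_income=("Income", "mean"),
    median_income=("Income", "median"),
)
print(summary)
```

A large gap between mean_income and median_income for a product would point to the income outliers seen in the box plots.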
The following code draws a bar plot of the mean income by model number of the treadmill used by the customer, with the value printed on each bar.
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
g = sns.barplot(x="Product", y="Income", data=products, ax=ax)
# Write each mean income on its bar; row.name is the row index, which matches the bar's x position
for index, row in products.iterrows():
    g.text(row.name, row.Income, int(row.Income), color='black', ha="center")
plt.title("Income by Model no. of Treadmill Used");
Observation from Barplot: this bar plot confirms that the product TM798 is bought by more affluent customers, which again suggests that TM798 may be the latest and more expensive treadmill model.
In this section, the data is grouped by both product and marital status to look at the mean age, education, usage, fitness score, income and miles of each combination. Then a categorical bar plot is drawn using the seaborn function 'catplot'.
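# Mean of each numeric column per (Product, MaritalStatus) pair; numeric_only=True skips the Gender text column
mstatus = data.groupby(['Product', 'MaritalStatus']).mean(numeric_only=True).round(0).reset_index()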
mstatus
| | Product | MaritalStatus | Age | Education | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|---|---|
| 0 | TM195 | Partnered | 30.0 | 15.0 | 3.0 | 3.0 | 47849.0 | 77.0 |
| 1 | TM195 | Single | 27.0 | 15.0 | 3.0 | 3.0 | 44272.0 | 91.0 |
| 2 | TM498 | Partnered | 30.0 | 15.0 | 3.0 | 3.0 | 49523.0 | 90.0 |
| 3 | TM498 | Single | 27.0 | 15.0 | 3.0 | 3.0 | 48150.0 | 85.0 |
| 4 | TM798 | Partnered | 30.0 | 17.0 | 5.0 | 5.0 | 82047.0 | 183.0 |
| 5 | TM798 | Single | 28.0 | 17.0 | 5.0 | 5.0 | 66505.0 | 145.0 |
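This long table can be reshaped into one row per product, which makes the partnered-versus-single income gap explicit. A minimal sketch using DataFrame.pivot on the Income values copied from the table above; the column name "Gap" is my own:

```python
import pandas as pd

# Income column of the 'mstatus' table above.
mstatus_income = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "MaritalStatus": ["Partnered", "Single"] * 3,
    "Income": [47849.0, 44272.0, 49523.0, 48150.0, 82047.0, 66505.0],
})

# One row per product, one column per marital status.
wide = mstatus_income.pivot(index="Product", columns="MaritalStatus", values="Income")
wide["Gap"] = wide["Partnered"] - wide["Single"]  # partnered minus single mean income
print(wide)
```

The gap is about 1.4K to 3.6K for TM195 and TM498, but about 15.5K for TM798.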
g = sns.catplot(x="Product", y="Income", hue="MaritalStatus", data=mstatus, height=5, aspect=1.2, kind="bar")
plt.title("Income by Model no. of Treadmill Used and Marital Status")
g.fig.set_size_inches(12, 8)
ax = g.facet_axis(0, 0)
# Label each bar with its height in thousands (e.g. 82.0K)
for p in ax.patches:
    ax.text(p.get_x() + 0.1,
            p.get_height() * 1.02,
            '{0:.1f}K'.format(p.get_height() / 1000),  # Used to format in K
            color='black',
            rotation='horizontal',
            size='large')
plt.show()
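Observation from Categorical Plot: another important piece of information can be gleaned from this bar plot. The TM798 appears to be purchased more by partnered than by single customers, and partnered TM798 buyers have a much higher mean income (about 82.0K) than single ones (about 66.5K). Note that this plot shows mean income, not the number of customers, so the count comparison has to be checked separately.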
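Because the bar plot shows mean income, the claim about how many partnered versus single customers buy each model needs a count table. A minimal sketch with pd.crosstab, using the five head() rows (all TM195) as placeholder input; on the full data the call would be pd.crosstab(data["Product"], data["MaritalStatus"]):

```python
import pandas as pd

# Product and MaritalStatus of the five head() rows (illustration only).
sample = pd.DataFrame({
    "Product": ["TM195"] * 5,
    "MaritalStatus": ["Single", "Single", "Partnered", "Single", "Partnered"],
})

# Number of customers for each (product, marital status) combination.
counts = pd.crosstab(sample["Product"], sample["MaritalStatus"])

# Same table as row shares: the marital-status mix within each product.
shares = pd.crosstab(sample["Product"], sample["MaritalStatus"], normalize="index")
print(counts)
print(shares)
```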
In the following section, I provide some recommendations to the owner of Cardio Good Fitness based on the data I explored.
In this first assignment, I explored the Cardio Good Fitness dataset through univariate and bivariate analyses, and then recommended solutions to the owner of the fitness store.